Non-linear maximum likelihood feature transformation for speech recognition

Authors

  • Mohamed Kamal Omar
  • Mark Hasegawa-Johnson
Abstract

Most automatic speech recognition (ASR) systems use a hidden Markov model (HMM) with a diagonal-covariance Gaussian mixture model for the state-conditional probability density function. The diagonal-covariance Gaussian mixture can model discrete sources of variability such as speaker variations, gender variations, or local dialect, but it cannot model continuous types of variability that account for correlation between the elements of the feature vector. In this paper, we present a transformation of the acoustic feature vector that minimizes an empirical estimate of the relative entropy between the likelihood based on the diagonal-covariance Gaussian mixture HMM and the true likelihood. Based on this formulation, we provide a solution to the problem using volume-preserving maps; existing linear feature transform designs are shown to be special cases of the proposed solution. Since most of the acoustic features used in ASR are not linear functions of the sources of correlation in the speech signal, we use a non-linear transformation of the features to minimize this objective function. We describe an iterative algorithm that estimates the parameters of both the volume-preserving feature transformation and the HMM so that they jointly optimize the objective function for an HMM-based speech recognizer. Using this algorithm, we achieved a 2% improvement in phoneme recognition accuracy compared to the baseline system. Our approach also improves recognition accuracy compared to previous linear approaches such as linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT), and independent component analysis (ICA).
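To make the idea of a volume-preserving feature map concrete, the following is a minimal toy sketch in Python, not the authors' algorithm. It assumes a 2-D synthetic feature vector whose second element depends non-linearly on the first, a single diagonal-covariance Gaussian in place of a Gaussian-mixture HMM, and a crude grid search in place of the iterative estimation described in the abstract. The names `shear`, `make_data`, and `diag_gauss_avg_loglik` are hypothetical. Because the map has a unit Jacobian determinant, the likelihood of the original features equals the likelihood of the transformed features with no extra Jacobian term, so the two average log-likelihoods can be compared directly.

```python
import math
import random


def diag_gauss_avg_loglik(points):
    """Average log-likelihood of 2-D points under a diagonal-covariance
    Gaussian fitted to those same points by maximum likelihood."""
    n = len(points)
    total = 0.0
    for d in range(2):
        vals = [p[d] for p in points]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        # At the ML estimate the average per-dimension log-likelihood
        # reduces to -0.5 * (log(2*pi*var) + 1).
        total += -0.5 * (math.log(2.0 * math.pi * var) + 1.0)
    return total


def shear(point, a):
    """Hypothetical volume-preserving non-linear map:
    (x1, x2) -> (x1, x2 - a * x1**2)."""
    x1, x2 = point
    return (x1, x2 - a * x1 * x1)


def jacobian_det(point, a):
    """Determinant of the Jacobian of `shear`.
    J = [[1, 0], [-2*a*x1, 1]] is triangular with a unit diagonal."""
    x1, _ = point
    return 1.0 * 1.0 - 0.0 * (-2.0 * a * x1)


def make_data(n, seed=0):
    """Synthetic features (an assumption): x2 depends non-linearly on x1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = x1 * x1 + rng.gauss(0.0, 0.3)
        data.append((x1, x2))
    return data


data = make_data(2000)
baseline = diag_gauss_avg_loglik(data)

# Crude grid search over the single transform parameter `a`.
candidates = [i / 10.0 for i in range(0, 21)]
best_a = max(
    candidates,
    key=lambda a: diag_gauss_avg_loglik([shear(p, a) for p in data]),
)
transformed = diag_gauss_avg_loglik([shear(p, best_a) for p in data])

print("baseline avg log-likelihood:   ", baseline)
print("transformed avg log-likelihood:", transformed)
print("selected a:", best_a)
```

In this toy setting no linear transform can remove the quadratic dependence between the two features, which is the kind of situation the abstract gives as motivation for a non-linear map.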


Similar resources

Generalized discriminative feature transformation for speech recognition

We propose a new algorithm called Generalized Discriminative Feature Transformation (GDFT) for acoustic models in speech recognition. GDFT is based on Lagrange relaxation on a transformed optimization problem. We show that the existing discriminative feature transformation methods like feature space MMI/MPE (fMMI/MPE), region dependent linear transformation (RDLT), and a non-discriminative feat...


Large vocabulary conversational speech recognition with the extended maximum likelihood linear transformation (EMLLT) model

This paper applies the recently proposed Extended Maximum Likelihood Linear Transformation (EMLLT) model in a Speaker Adaptive Training (SAT) context on the Switchboard database. Adaptation is carried out with maximum likelihood estimation of linear transforms for the means, precisions (inverse covariances) and the feature-space under the EMLLT model. This paper shows the first experimental evi...


Maximum Likelihood Non-linear Transformation for Environment Adaptation in Speech Recognition Systems

In this paper, we describe an adaptation method for speech recognition systems that is based on a piecewise-linear approximation to a non-linear transformation of the feature space. The method extends a previously proposed non-linear transformation (NLT) technique by making the transformation function more sophisticated (piecewise-linear instead of piecewise-constant), and by computing the trans...


Review on Heteroscedastic Discriminant Analysis

Discriminant feature spaces are an attractive way to improve the word error rate of speech recognition systems. Heteroscedastic discriminant analysis (HDA) is a generalized method for feature space transformation that does not impose the equal within-class covariance assumptions required by standard linear discriminant analysis (LDA). It will be shown that the co...


Maximum Likelihood Linear Transformations for HMM

This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear feature-space transformations are inappropriate in this case. Hence, only model-b...




Publication date: 2003